Abstract

We consider the problem of decentralized detection in a network consisting of a large number 
of nodes arranged as a tree of bounded height, under the assumption of conditionally independent, 
identically distributed observations. We characterize the optimal error exponent under a Neyman-Pearson 
formulation. We show that the Type II error probability decays exponentially fast with the number of 
nodes, and the optimal error exponent is often the same as that corresponding to a parallel configuration. 
We provide sufficient, as well as necessary, conditions for this to happen. For those networks satisfying 
the sufficient conditions, we propose a simple strategy that nearly achieves the optimal error exponent, 
and in which all non-leaf nodes need only send 1-bit messages. 

Index Terms 

Decentralized detection, error exponent, sensor networks. 

I. Introduction 

Most of the decentralized detection literature has been concerned with characterizing optimal
detection strategies for particular sensor configurations; the comparison of the detection
performance of different configurations is a rather unexplored area. We bridge this gap by considering
the asymptotic performance of bounded height tree networks. We analyze the dependence of the
optimal error exponent on the network architecture, and characterize the optimal error exponent
for a large class of tree networks.

The problem of optimal decentralized detection has attracted a lot of interest over the last 
twenty-five years. Tenney and Sandell [1] are the first to consider a decentralized detection 
system in which each of several sensors makes an observation and sends a summary (e.g., 
using a quantizer or other "transmission function") to a fusion center. Such a system is to be 
contrasted to a centralized one, where the raw observations are transmitted directly to the fusion 
center. The framework introduced in [1] involves a "star topology" or "parallel configuration": 
the fusion center is regarded as the root of a tree, while the sensors are the leaves, directly 
connected to the root. Several pieces of work follow, e.g., [2]-[12], all of which study the 
parallel configuration under a Neyman-Pearson or Bayesian criterion. A common goal of these 
references is to characterize the optimal transmission function, where optimality usually refers 
to the minimization of the probability of error or some other cost function at the fusion center. 
A typical result is that under the assumption of (conditionally) independent sensor observations, 
likelihood ratio quantizers are optimal; see [6] for a summary of such results. 

The study of sensor networks other than the parallel configuration is initiated in [13], which 
considers a tandem configuration, as well as more general tree configurations, and
characterizes optimal transmission strategies under a Bayesian formulation. Tree configurations are also
discussed in [14]-[21], under various performance objectives. In all but the simplest cases,
the exact form of optimal strategies in tree configurations is difficult to derive. Most of these 
references focus on person-by-person (PBP) optimality and obtain necessary, but not sufficient, 
conditions for an optimal strategy. When the transmission functions are assumed to be finite- 
alphabet quantizers, typical results establish that under a conditional independence assumption, 
likelihood ratio quantizers are PBP optimal. However, finding the optimal quantizer thresholds 
requires the solution of a nonlinear system of equations, with as many equations as there are 
thresholds. As a consequence, computing the optimal thresholds or characterizing the overall 
performance is hard, even for networks of moderate size. 

Because of these difficulties, the analysis and comparison of large sensor networks is
apparently tractable only in an asymptotic regime that focuses on the rate of decay of the error
probabilities as the number of sensors increases. For example, in the Neyman-Pearson framework,
one can focus on minimizing the error exponent^1

$$g = \limsup_{n \to \infty} \frac{1}{n} \log \beta_n,$$

where $\beta_n$ is the Type II error probability at the fusion center and $n$ is the number of sensors,
while keeping the Type I error probability less than some given threshold. Note our convention
that error exponents are negative numbers. The magnitude of the error exponent, $|g|$, is commonly
referred to as the rate of decay of the Type II error probability. A larger $|g|$ would translate to a
faster decay rate, hence a better detection performance. This problem has been studied in [22], 
for the case of a parallel configuration with a large number of sensors that receive independent, 
identically distributed (i.i.d.) observations. 

The asymptotic performance of another special configuration, involving n sensors arranged in 
tandem, has been studied in [23]-[25], under a Bayesian formulation. Necessary and sufficient 
conditions for the error probability to decrease to zero as n increases have been derived. However, 
even when the error probability decreases to zero, it apparently does so at a sub-exponential rate 
(see [26] for such a result for the Bayesian case). Accordingly, [25] argues that the tandem 
configuration is inefficient and suggests that as the number of sensors increases, the network 
"should expand more in a parallel than in [a] tandem" fashion. 

Even though the error probabilities in a parallel configuration decrease exponentially, the 
energy consumption of having each sensor transmit directly to the fusion center can be too high. 
The energy consumption can be reduced by setting up a directed spanning in-tree, rooted at the 
fusion center. In a tree configuration, each non-leaf node combines its own observation (if any) 
with the messages it has received and forms a new message, which it transmits to another node. 
In this way, information from each node is propagated along a multi-hop path to the fusion 
center, but the information is "degraded" along the way. For the case where observations are 
obtained only at the leaves, it is not hard to see that the detection performance of such a tree 
cannot be better than that of a parallel configuration with the same number of leaves. 

In this paper, we investigate the detection performance of a tree configuration under a
Neyman-Pearson criterion. We restrict to trees with bounded height for two reasons. First, without a
restriction on the height of the tree, performance can be poor (this is exemplified by tandem
networks in which, as remarked above, the error probability seems to decay at a sub-exponential
rate). Second, bounded height translates to a bound on the delay until information reaches the
fusion center.

1 Throughout this paper, log stands for the natural logarithm.

As it is not apparent that the Type II error probability decays exponentially fast with the 
number of nodes in the network, we first show that under the bounded height assumption, 
exponential decay is possible. We then obtain the rather counterintuitive result that if leaves 
dominate (in the sense that asymptotically almost all nodes are leaves), then bounded height 
trees have the same asymptotic performance as the parallel configuration, even in non-trivial 
cases. (Such an equality is clear in some trivial cases, e.g., the configuration shown in Figure 
1, but is unexpected in general.) This result has important ramifications: a system designer can 
reduce the energy consumption in a network (e.g., by employing an h-hop spanning tree that 
minimizes the overall energy consumption), without losing detection efficiency, under certain 
conditions. 



Fig. 1. A tree network of height $h$, with $n - h$ leaves. Its error probability is no larger than that of a parallel configuration
with $n - h$ leaves and a fusion center. If $h$ is bounded while $n$ increases, the optimal error exponent is the same as for a parallel
configuration with $n$ leaves.

We also provide a strategy in which each non-leaf node sends only a 1-bit message, and 
which nearly achieves the same performance as the parallel configuration. These results are 
counterintuitive for the following reasons: 1) messages are compressed to only one bit at each 
non-leaf node so that "information" is lost along the way, whereas in the parallel configuration, 
no such compression occurs; 2) even though leaves dominate, there is no reason why the error 
exponent will be determined solely by the leaves. For example, our discussion in Section V-E 
indicates that without the bounded height assumption, or if a Bayesian framework is assumed 
instead of the Neyman-Pearson formulation, then a generic tree network (of height greater than 
1) performs strictly worse than a parallel configuration, even if leaves dominate. 

Finally, under a mild additional assumption on the allowed transmission functions, we find
that the sufficient conditions for achieving the same error exponent as a parallel configuration
are also necessary.

The rest of this paper is organized as follows. In Section II, we present our model in detail. In 
Section III, we state the Neyman-Pearson problem, provide some motivating examples, and state
the main results. In Section IV, we consider "relay trees," in which observations are only made 
at the leaves. In Section V, we prove the main results. Finally, in Section VI, we summarize 
and offer some concluding remarks. 

II. Problem Formulation 

In this section, we introduce the model and the required notation. We consider a decentralized
binary detection problem involving $n - 1$ sensors and a fusion center; we will be interested
in the case where $n$ increases to infinity. We are given two probability spaces $(\Omega, \mathcal{F}, P_0)$ and
$(\Omega, \mathcal{F}, P_1)$, associated with two hypotheses $H_0$ and $H_1$. We use $E_j$ to denote the expectation
operator with respect to $P_j$. Each sensor $v$ observes a random variable $X_v$ taking values in some
set $\mathcal{X}$. Under either hypothesis $H_j$, $j = 0, 1$, the random variables $X_v$ are i.i.d., with marginal
distribution $P_j^X$.

A. Tree Networks 

The configuration of the sensor network is represented by a directed tree $T_n = (V_n, E_n)$. Here,
$V_n$ is the set of nodes, of cardinality $n$, and $E_n$ is the set of directed arcs of the tree. One of
the nodes (the "root") represents the fusion center, and the remaining $n - 1$ nodes represent
the remaining sensors. We will always use the special symbol $f$ to denote the root of $T_n$. We
assume that the arcs are oriented so that they all point towards the fusion center. In the sequel,
whenever we use the term "tree", we mean a directed, rooted tree as described above.

We will use the terminology "sensor" and "node" interchangeably. Moreover, the fusion center 
$f$ will also be called a sensor, even though it plays the special role of fusing; whether the fusion
center makes its own observation or not is irrelevant, since we are working in the large n regime, 
and we will assume it does not. 

We say that node $u$ is a predecessor of node $v$ if there exists a directed path from $u$ to $v$. In
this case, we also say that $v$ is a successor of $u$. An immediate predecessor of node $v$ is a node
$u$ such that $(u, v) \in E_n$. An immediate successor is similarly defined. Let the set of immediate
predecessors of $v$ be $C_n(v)$. If $v$ is a leaf, $C_n(v)$ is naturally defined to be empty. The length of
a path is defined as the number of arcs in the path. The height of the tree $T_n$ is the length of
the longest path from a leaf to the root, and will be denoted by $h_n$.

Since we are interested in asymptotically large values of $n$, we will consider a sequence of
trees $(T_n)_{n \ge 1}$. While we could think of the sequence as representing the evolution of the network
as sensors are added, we do not require the sequence $E_n$ to be an increasing sequence of sets;
thus, the addition of a new sensor to $T_n$ may result in some edges being deleted and some new
edges being added. We define the height of a sequence of trees to be $h = \sup_{n \ge 1} h_n$. We are
interested in tree sequences of bounded height, i.e., $h < \infty$.

Definition 1 (h-uniform tree): A tree $T_n$ is said to be $h$-uniform if the length of every path
from a leaf to the root is exactly $h$. A sequence of trees $(T_n)_{n \ge 1}$ is said to be $h$-uniform if there
exists some $n_0 < \infty$, so that for all $n \ge n_0$, $T_n$ is $h$-uniform.

For a tree with height $h$, we say that a node is at level $k$ if it is connected to the fusion center
via a path of length $h - k$. Hence the fusion center $f$ is at level $h$, while in an $h$-uniform tree,
all leaves are at level 0.

Let $l_n(v)$ be the number of leaves of the sub-tree rooted at the node $v$. (These are the leaves
whose path to $f$ goes through $v$.) Thus, $l_n(f)$ is the total number of leaves. Let $p_n(v)$ be the
total number of predecessors of $v$, i.e., the total number of nodes in the sub-tree rooted at $v$, not
counting $v$ itself. Thus, $p_n(f) = n - 1$. We let $A_n \subset V_n$ be the set of nodes whose immediate
predecessors include leaves of the tree $T_n$. Finally, we let $B_n \subset A_n$ be the set of nodes all of
whose predecessors are leaves; see Figure 2.
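The quantities just defined can be computed mechanically from a list of arcs. The following is a minimal sketch, not part of the original development: it assumes the tree is given as a hypothetical `parent` dictionary mapping each non-root node to its immediate successor, and it takes $l_n(v) = 1$ when $v$ is itself a leaf. The function name `tree_stats` and the node labels are illustrative only.

```python
from collections import defaultdict


def tree_stats(parent, root):
    """Compute h_n, l_n(v), p_n(v), A_n and B_n for a rooted in-tree.

    `parent` maps every non-root node to its immediate successor
    (an assumed input format chosen for this illustration).
    """
    children = defaultdict(list)  # C_n(v): immediate predecessors of v
    for u, v in parent.items():
        children[v].append(u)
    nodes = set(parent) | {root}
    leaves = {v for v in nodes if not children[v]}

    def num_leaves(v):  # l_n(v); a leaf counts itself (assumption)
        return 1 if v in leaves else sum(num_leaves(u) for u in children[v])

    def num_predecessors(v):  # p_n(v): nodes in the sub-tree below v, excluding v
        return sum(1 + num_predecessors(u) for u in children[v])

    def depth(v):  # number of arcs on the path from v to the root
        return 0 if v == root else 1 + depth(parent[v])

    height = max(depth(v) for v in leaves)  # h_n
    a_set = {v for v in nodes if any(u in leaves for u in children[v])}
    b_set = {v for v in a_set if all(u in leaves for u in children[v])}
    return {
        "height": height,
        "l": {v: num_leaves(v) for v in nodes},
        "p": {v: num_predecessors(v) for v in nodes},
        "A": a_set,
        "B": b_set,
    }


# A small tree in the spirit of Figure 2: v has a leaf and a non-leaf below it,
# while u has only leaves below it.
example = {"v": "f", "u": "v", "x1": "u", "x2": "u", "x3": "v"}
stats = tree_stats(example, "f")
print(stats["height"], stats["l"]["f"], stats["p"]["f"], stats["A"], stats["B"])
```

In this example both `u` and `v` fall in $A_n$, but only `u` falls in $B_n$, and $p_n(f)$ equals $n - 1$ as stated above.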




Fig. 2. Both nodes $v$ and $u$ belong to the set $A_n$, but only node $u$ belongs to the set $B_n$.

B. Strategies

Given a tree $T_n$, consider a node $v \neq f$. Node $v$ receives messages $Y_u$ from every $u \in C_n(v)$
(i.e., from its immediate predecessors). Node $v$ then uses a transmission function $\gamma_v$ to encode
and transmit a summary $Y_v = \gamma_v(X_v, \{Y_u : u \in C_n(v)\})$ of its own observation $X_v$, and of the
received messages $\{Y_u : u \in C_n(v)\}$, to its immediate successor.^2 We constrain all messages to
be symbols in a fixed alphabet $\mathcal{T}$. Thus, if the in-degree of $v$ is $|C_n(v)| = d$, then the transmission
function $\gamma_v$ maps $\mathcal{X} \times \mathcal{T}^d$ to $\mathcal{T}$. Let $\Gamma(d)$ be a given set of transmission functions that the node
$v$ can choose from. In general, $\Gamma(d)$ is a subset of the set of all possible mappings from $\mathcal{X} \times \mathcal{T}^d$
to $\mathcal{T}$. For example, $\Gamma(d)$ is often assumed to be the set of quantizers whose outputs are the
result of comparing likelihood ratios to some thresholds (cf. the definition of a Log-Likelihood
Ratio Quantizer in Section III-B). For convenience, we denote the set of transmission functions
for the leaves, $\Gamma(0)$, by $\Gamma$. We assume that all transmissions are perfectly reliable.

Consider now the root $f$, and suppose that it has $d$ immediate predecessors. It receives
messages from its immediate predecessors, and based on this information, it decides between
the two hypotheses $H_0$ and $H_1$, using a fusion rule $\gamma_f : \mathcal{T}^d \mapsto \{0, 1\}$.^3 Let $Y_f$ be a binary-valued
random variable indicating the decision of the fusion center.

We define a strategy for a tree $T_n$, with $n - 1$ nodes and a fusion center, as a collection
of transmission functions, one for each node, and a fusion rule. In some cases, we will be
considering strategies in which only the leaves make observations; every other node $v$ simply
fuses the messages it has received, and forwards a message $Y_v = \gamma_v(\{Y_u : u \in C_n(v)\})$ to its
immediate successor. A strategy of this type will be called a relay strategy. A tree network in 
which we restrict to relay strategies will be called a relay tree. If in addition, the alphabet T is 
binary, we will use the terms 1-bit relay strategy and 1-bit relay tree. Finally, in a relay tree, 
nodes other than the root and the leaves will be called relay nodes. 

2 To simplify the notation, we suppress the dependence of $X_v$, $Y_v$, $\gamma_v$, etc. on $n$.

3 Recall that in centralized Neyman-Pearson detection, randomization can reduce the Type II error probability. Therefore,
in general, the fusion center uses a randomized fusion rule to make its decision. Similarly, the transmission functions $\gamma_v$ used
by each node $v$ can also be randomized. We avoid any discussion of randomization to simplify the exposition, and because
randomization is not required asymptotically, as will become apparent in Section V.

III. The Neyman-Pearson Problem

In this section, we formulate the Neyman-Pearson decentralized detection problem in a tree 
network. We provide some motivating examples, and introduce our assumptions. Then, we give 
a summary of the main results. 

Given a tree $T_n$, we require that the Type I error probability $P_0(Y_f = 1)$ be no more than a
given $\alpha \in (0, 1)$. A strategy is said to be admissible if it meets this constraint. We are interested
in minimizing the Type II error probability $P_1(Y_f = 0)$. Accordingly, we define $\beta^*(T_n)$ as
the infimum of $P_1(Y_f = 0)$, over all admissible strategies. Similarly, we define $\beta_R^*(T_n)$ as the
infimum of $P_1(Y_f = 0)$, over all admissible relay strategies. Typically, $\beta^*(T_n)$ or $\beta_R^*(T_n)$ will
converge to zero as $n \to \infty$. We are interested in the question of whether such convergence
takes place exponentially fast, and in the exact value of the Type II error exponent, defined by

$$g^* = \limsup_{n \to \infty} \frac{1}{n} \log \beta^*(T_n), \qquad g_R^* = \limsup_{n \to \infty} \frac{1}{l_n(f)} \log \beta_R^*(T_n).$$

Note that in the relay case, we use the total number of leaves $l_n(f)$ instead of $n$ in the definition
of $g_R^*$. This is because only the leaves make observations and therefore, $g_R^*$ measures the rate of
error decay per observation.

We denote the Kullback-Leibler (KL) divergence of two probability measures, $P$ and $Q$, as

$$D(P \| Q) = E^P\left[\log \frac{dP}{dQ}\right],$$

where $E^P$ is the expectation operator with respect to (w.r.t.) $P$. Suppose that $X$ is a sensor
observation. For any $\gamma \in \Gamma$, let the distribution of $\gamma(X)$ be $P_j^\gamma$. Note that
$-D(P_0^\gamma \| P_1^\gamma) \le 0 \le D(P_1^\gamma \| P_0^\gamma)$, with both inequalities being strict as long as the measures $P_0^\gamma$ and $P_1^\gamma$ are not
indistinguishable.

In the classical case of a parallel configuration, with $n - 1$ leaves directly connected to the
fusion center, the optimal error exponent, denoted as $g_P^*$, is given by [22]

$$g_P^* = \lim_{n \to \infty} \frac{1}{n} \log \beta^*(T_n) = -\sup_{\gamma \in \Gamma} D(P_0^\gamma \| P_1^\gamma), \qquad (1)$$

under Assumptions 1-2, stated in Section III-B below.
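As a numerical illustration of Eq. (1), the sketch below evaluates $-\sup_{\gamma} D(P_0^\gamma \| P_1^\gamma)$ by brute force. It rests on assumptions that are not part of the paper: the observation alphabet is finite with four symbols, the two pmfs `p0` and `p1` are made-up numbers, and $\Gamma$ is taken to be the set of all deterministic 1-bit quantizers. The function names are hypothetical.

```python
import itertools
import math


def kl(p, q):
    """KL divergence D(p || q) for pmfs on a common finite alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def quantized(pmf, gamma, levels=2):
    """pmf of gamma(X) when X has the given pmf and gamma[x] is the output symbol."""
    out = [0.0] * levels
    for x, px in enumerate(pmf):
        out[gamma[x]] += px
    return out


def parallel_exponent(p0, p1):
    """-sup over all deterministic 1-bit quantizers of D(P_0^gamma || P_1^gamma)."""
    best = 0.0
    for gamma in itertools.product(range(2), repeat=len(p0)):
        best = max(best, kl(quantized(p0, gamma), quantized(p1, gamma)))
    return -best


# Illustrative pmfs under H_0 and H_1 (assumed values).
p0 = [0.4, 0.3, 0.2, 0.1]
p1 = [0.1, 0.2, 0.3, 0.4]
print("g_P* with 1-bit quantizers:", parallel_exponent(p0, p1))
print("centralized exponent -D(P_0^X || P_1^X):", -kl(p0, p1))
```

The quantized exponent is negative but no smaller than the centralized value $-D(P_0^X \| P_1^X)$, in line with the fact that quantization cannot increase the KL divergence.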

SUBMITTED TO IEEE TRANS. INFORMATION THEORY (DRAFT, March 16, 2008)

Our objective is to study $g^*$ and $g_R^*$ for different sequences of trees. In particular, we wish
to obtain bounds on these quantities, develop conditions under which they are strictly negative
(indicating exponential decay of error probabilities), and develop conditions under which they
are equal to $g_P^*$. At this point, under Assumptions 1-2, we can record two relations that are
always true:

$$g_P^* \le g_R^*, \qquad -D(P_0^X \| P_1^X) \le g^* \le z g_R^*, \qquad (2)$$

where $z = \liminf_{n \to \infty} l_n(f)/n$. The first inequality is true because all of the combining of messages
that takes place in a relay network can be carried out internally, at the fusion center of a parallel
network with the same number of leaves. The inequality $-D(P_0^X \| P_1^X) \le g^*$ follows from the
fact that $-D(P_0^X \| P_1^X)$ is the classical error exponent in a centralized system where all raw
observations are transmitted directly to the fusion center. Finally, the inequality $g^* \le z g_R^*$
follows because an optimal strategy is at least as good as an optimal relay strategy; the factor
of $z$ arises because we have normalized $g_R^*$ by $l_n(f)$ instead of $n$.

For a sequence of trees of the form shown in Figure 1, it is easily seen that $g^* = g_R^* = g_P^*$. In
order to develop some insights into the problem, we now consider some less trivial examples. 

A. Motivating Examples 

In the following examples, we restrict to relay strategies for simplicity, i.e., we are interested
in characterizing the error exponent $g_R^*$. However, most of our subsequent results hold without
such a restriction, and similar statements can be made about the error exponent $g^*$ (cf. Theorem
1).

Example 1: Consider a 2-uniform sequence of trees, as shown in Figure 3, where each node
$v_i$ receives messages from $m = (n - 3)/2$ leaves (for simplicity, we assume that $n$ is odd).




Fig. 3. A 2-uniform tree with two relay nodes. 



Let us restrict to 1-bit relay strategies. Consider the fusion rule that declares $H_0$ iff both $v_1$
and $v_2$ send a 0. In order to keep the Type I error probability bounded by $\alpha$, we view the
message by each $v_i$ as a local decision about the hypothesis, and require that its local Type I
error probability be bounded by $\alpha/2$. Furthermore, by viewing the sub-tree rooted at $v_i$ as a
parallel configuration, we can design strategies for each sub-tree so that

$$\lim_{n \to \infty} \frac{1}{m} \log P_1(Y_{v_i} = 0) = g_P^*. \qquad (3)$$

At the fusion center, the Type II error exponent is then given by

$$\lim_{n \to \infty} \frac{1}{n} \log \beta_n = \lim_{n \to \infty} \frac{1}{n} \log P_1(Y_{v_1} = 0, Y_{v_2} = 0)$$
$$= \frac{1}{2} \lim_{n \to \infty} \frac{1}{m} \log P_1(Y_{v_1} = 0) + \frac{1}{2} \lim_{n \to \infty} \frac{1}{m} \log P_1(Y_{v_2} = 0)$$
$$= g_P^*,$$

where the last equality follows from (3). This shows that the Type II error probability falls
exponentially and, more surprisingly, that $g_R^* \le g_P^*$. In view of Eq. (2), we have $g_R^* = g_P^*$.
It is not difficult to generalize this conclusion to all sequences of trees in which the number
$n - l_n(f) - 1$ of relay nodes is bounded. For such sequences, we will also see that $g^* = g_R^*$ (cf.
Theorem 1(iii)). □
Example 2: We now consider an example in which the number of relay nodes grows with n. 
In Figure 4, we let both $m$ and $N$ be increasing functions of $n$ (the total number of nodes), in
a manner to be made explicit shortly. 




Fig. 4. A 2-uniform tree with a large number of relay nodes. 



Let us try to apply an argument similar to that of Example 1, to see whether the optimal exponent
of the parallel configuration can be achieved with a relay strategy, i.e., whether $g_R^* = g_P^*$. We
let each node $v_i$ use a local Neyman-Pearson test. We also let the fusion center declare $H_0$ iff it
receives a 0 from all relay sensors. In order to have a hope of achieving the error exponent of
the parallel configuration, we need to choose the local Neyman-Pearson test at each relay so that
its local Type II error exponent is close to $g_P^* = -\sup_{\gamma \in \Gamma} D(P_0^\gamma \| P_1^\gamma)$. However, the associated
local Type I error cannot fall faster than exponentially, so we can assume it is bounded below
by $\delta \exp(-m\epsilon)$, for some $\delta, \epsilon > 0$, and for all $m$ large enough. In that case, the overall Type
I error probability (at the fusion center) is at least $1 - (1 - \delta e^{-m\epsilon})^N$. We then note that if
$N$ increases quickly with $m$ (e.g., $N = m^m$), the Type I error probability approaches 1, and
eventually exceeds $\alpha$. Hence, we no longer have an admissible strategy. Thus, if there is a hope
of achieving the optimal exponent $g_P^*$ of the parallel configuration, a more complicated fusion
rule will have to be used. □
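The lower bound $1 - (1 - \delta e^{-m\epsilon})^N$ from Example 2 is easy to evaluate numerically. The sketch below is only an illustration: the values `delta = 0.5` and `eps = 1.0` are arbitrary assumptions, and the function name is hypothetical. It contrasts a fixed number of relays (as in Example 1) with $N = m^m$.

```python
import math


def type1_lower_bound(m, num_relays, delta=0.5, eps=1.0):
    """Evaluate 1 - (1 - delta * exp(-m * eps)) ** num_relays in a numerically stable way."""
    local = delta * math.exp(-m * eps)  # assumed lower bound on each relay's Type I error
    # (1 - local) ** N = exp(N * log1p(-local)); expm1 keeps precision when the result is tiny.
    return -math.expm1(num_relays * math.log1p(-local))


for m in (5, 10, 20):
    fixed = type1_lower_bound(m, 2)                # two relays, as in Example 1
    growing = type1_lower_bound(m, float(m) ** m)  # N = m^m relays
    print(m, fixed, growing)
```

With two relays the bound shrinks as $m$ grows, whereas with $N = m^m$ it approaches 1, so for any fixed $\alpha < 1$ the "declare $H_0$ iff all relays send 0" rule eventually stops being admissible.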

Our subsequent results will establish that, similar to Example 1, the equalities $g^* = g_R^* = g_P^*$
also hold in Example 2. However, Example 2 shows that in order to achieve this optimal error 
exponent, we may need to employ nontrivial fusion rules at the fusion center (and for similar 
reasons at the relay nodes), and various thresholds will have to be properly tuned. The simplicity 
of the fusion rule in Example 1 is not representative. 

In our next example, the optimal error exponent is inferior to (strictly larger than) that of a
parallel configuration.

Example 3: Consider a sequence of 1-bit relay trees with the structure shown in Figure 5.
Let the observations $X_v$ at the leaves be i.i.d. Bernoulli random variables with parameter $1 - p$
under $H_0$, and parameter $p$ under $H_1$, where $1/2 < p < 1$. Note that

$$g_P^* = E_0\left[\log \frac{dP_1^X}{dP_0^X}\right] = p \log \frac{1-p}{p} + (1 - p) \log \frac{p}{1-p}.$$

Fig. 5. A 2-uniform tree, with $m = l_n(f)/2$.

We can identify this relay tree with a parallel configuration involving $m$ nodes, with each
node receiving an independent observation distributed as $\gamma(X_1, X_2)$. Note that we can restrict
the transmission function $\gamma$ to be the same for all nodes $v_1, \ldots, v_m$ [22], without loss of optimality.
We have

$$\lim_{n \to \infty} \frac{1}{m} \log \beta^*(T_n) = \min_{\gamma \in \Gamma(2)} \sum_{j=0}^{1} P_0(\gamma(X_1, X_2) = j) \log \frac{P_1(\gamma(X_1, X_2) = j)}{P_0(\gamma(X_1, X_2) = j)}. \qquad (4)$$

To minimize the right-hand side (R.H.S.) of (4), we only need to consider a small number of
choices for $\gamma$. If $\gamma(X_1, X_2) = X_1$, we are effectively removing half of the original $2m$ nodes,
and the resulting error exponent is $g_P^*/2$, which is inferior to $g_P^*$. Suppose now that $\gamma$ is of the
form $\gamma(X_1, X_2) = 0$ iff $X_1 = X_2 = 0$. Then, it is easy to see, after some calculations (omitted),
that

$$\lim_{n \to \infty} \frac{1}{m} \log \beta^*(T_n) = p^2 \log \frac{(1-p)^2}{p^2} + (1 - p^2) \log \frac{1 - (1-p)^2}{1 - p^2}$$
$$> 2\left(p \log \frac{1-p}{p} + (1 - p) \log \frac{p}{1-p}\right),$$

and

$$\lim_{n \to \infty} \frac{1}{l_n(f)} \log \beta^*(T_n) > p \log \frac{1-p}{p} + (1 - p) \log \frac{p}{1-p} = g_P^*.$$

Finally, we need to consider $\gamma$ of the form $\gamma(X_1, X_2) = 1$ iff $X_1 = X_2 = 1$. A similar
calculation (omitted) shows that the resulting error exponent is again inferior. We conclude that
the relay network is strictly inferior to the parallel configuration, i.e., $g_P^* < g_R^*$. An explanation
is provided by noting that this sequence of trees violates a necessary condition, developed in
Section V-F for the optimal error exponent to be the same as that of a parallel configuration;
see Theorem 1(iv). □
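The omitted calculations of Example 3 can be checked numerically. The sketch below is an illustration under stated assumptions: it fixes $p = 0.75$ (an arbitrary choice), and instead of the few candidate functions discussed in the text it enumerates all 16 deterministic maps from $\{0,1\}^2$ to $\{0,1\}$, which is a broader search than the paper describes. Function names are hypothetical.

```python
import itertools
import math


def expected_llr(q0, q1):
    """E_0[log(q1 / q0)] for pmfs q0, q1 on a common finite alphabet."""
    return sum(a * math.log(b / a) for a, b in zip(q0, q1) if a > 0)


def parallel_exponent(p):
    """g_P* for Bernoulli leaves: P_0(X = 1) = 1 - p and P_1(X = 1) = p."""
    return p * math.log((1 - p) / p) + (1 - p) * math.log(p / (1 - p))


def relay_exponent_per_leaf(p):
    """Minimize the R.H.S. of (4) over all 1-bit maps of (X_1, X_2), normalized per leaf."""
    pairs = list(itertools.product((0, 1), repeat=2))

    def joint(q):  # pmf of (X_1, X_2) when P(X = 1) = q
        return [(q if x1 else 1 - q) * (q if x2 else 1 - q) for x1, x2 in pairs]

    joint0, joint1 = joint(1 - p), joint(p)
    best = 0.0
    for gamma in itertools.product((0, 1), repeat=4):
        q0, q1 = [0.0, 0.0], [0.0, 0.0]
        for idx, bit in enumerate(gamma):
            q0[bit] += joint0[idx]
            q1[bit] += joint1[idx]
        best = min(best, expected_llr(q0, q1))
    return best / 2  # each level-1 node has two leaves, so l_n(f) = 2m


p = 0.75
print("g_P*            :", parallel_exponent(p))
print("relay, per leaf :", relay_exponent_per_leaf(p))
```

For this value of $p$ the best relay exponent per leaf is strictly larger (closer to zero) than $g_P^*$, consistent with the conclusion $g_P^* < g_R^*$.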

A comparison of the results for the previous examples suggests that we have $g_P^* = g_R^*$
(respectively, $g_P^* < g_R^*$) whenever the degree of level 1 nodes increases (respectively, stays
bounded) as n increases. That would still leave open the case of networks in which different 
level 1 nodes have different degrees, as in our next example. 

Example 4: Consider a sequence of 2-uniform trees of the form shown in Figure 6. Each
node $v_i$, $i = 1, \ldots, m$, has $i + 1$ leaves attached to it. We will see that the optimal error exponent
is again the same as for a parallel configuration, i.e., $g_R^* = g^* = g_P^*$ (cf. Theorem 1(ii)). □

B. Assumptions 

In this subsection, we list our assumptions. Assumptions 1 and 2 are similar to the assumptions
made in the study of the parallel configuration (see [22]).

Assumption 1: The measures $P_0^X$ and $P_1^X$ are equivalent, i.e., they are absolutely continuous
w.r.t. each other. Furthermore, there exists some $\gamma \in \Gamma$ such that $-D(P_0^\gamma \| P_1^\gamma) < 0 < D(P_1^\gamma \| P_0^\gamma)$.

Assumption 2: $E_0\left[\log^2 \frac{dP_1^X}{dP_0^X}\right] < \infty$.
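Assumption 2 implies the following lemma; see [22] for a proof.
Lemma 1: There exists some $a \in (0, \infty)$, such that for all $\gamma \in \Gamma$,

$$E_0\left[\log^2 \frac{dP_1^\gamma}{dP_0^\gamma}\right] \le E_0\left[\log^2 \frac{dP_1^X}{dP_0^X}\right] + 1 < a,$$

$$E_0\left[\left|\log \frac{dP_1^\gamma}{dP_0^\gamma}\right|\right] < a.$$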



Given an admissible strategy, and for each node $v \in V_n$, we consider the log-likelihood ratio
of the distribution of $Y_v$ (the message sent by $v$) under $H_1$, w.r.t. its distribution under $H_0$,

$$\mathcal{L}_{v,n}(y) = \log \frac{dP_{1,n}^{(v)}}{dP_{0,n}^{(v)}}(y),$$

where $dP_{1,n}^{(v)}/dP_{0,n}^{(v)}$ is the Radon-Nikodym derivative of the distribution of $Y_v$ under $H_1$ w.r.t.
that under $H_0$. If $Y_v$ takes values in a discrete set, then this is just the log-likelihood ratio
$\log(P_1(Y_v = y)/P_0(Y_v = y))$. For simplicity, we let $L_{v,n} = \mathcal{L}_{v,n}(Y_v)$ and define the
log-likelihood ratio of the received messages at node $v$ to be

$$S_n(v) = \sum_{u \in C_n(v)} L_{u,n}.$$

(Recall that $C_n(v)$ is the set of immediate predecessors of $v$.)

A (1-bit) Log-Likelihood Ratio Quantizer (LLRQ) with threshold $t$ for a non-leaf node $v$,
with $|C_n(v)| = d$, is a binary-valued function on $\mathcal{T}^d$, defined by

$$\mathrm{LLRQ}_{d,t}(\{y_u : u \in C_n(v)\}) = \begin{cases} 0, & \text{if } x \le t, \\ 1, & \text{if } x > t, \end{cases}$$

where

$$x = \frac{1}{l_n(v)} \sum_{u \in C_n(v)} \mathcal{L}_{u,n}(y_u). \qquad (5)$$

By definition, a node $v$ that uses a LLRQ ignores its own observation $X_v$ and acts as a relay. If
all non-leaf nodes use a LLRQ, we have a special case of a relay strategy. We will assume that
LLRQs are available choices of transmission functions for all non-leaf nodes.

Assumption 3: For all $t \in \mathbb{R}$ and $d > 0$, $\mathrm{LLRQ}_{d,t} \in \Gamma(d)$.
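To make the LLRQ definition concrete, here is a minimal sketch for the special case where every received message is a single bit. The argument names (`p0_one`, `p1_one`, `num_leaves`) are hypothetical: they stand for $P_0(Y_u = 1)$, $P_1(Y_u = 1)$ and $l_n(v)$, which in the paper are determined by the strategy and the tree rather than passed in by hand.

```python
import math


def llrq(messages, p0_one, p1_one, num_leaves, t):
    """1-bit LLRQ for binary messages.

    messages[i]  : bit y_u received from the i-th immediate predecessor
    p0_one[i]    : P_0(Y_u = 1) for that predecessor (assumed known)
    p1_one[i]    : P_1(Y_u = 1) for that predecessor (assumed known)
    num_leaves   : l_n(v), the number of leaves below the node
    t            : threshold
    """
    total = 0.0
    for y, q0, q1 in zip(messages, p0_one, p1_one):
        # Log-likelihood ratio of the observed bit under H_1 versus H_0.
        total += math.log(q1 / q0) if y == 1 else math.log((1 - q1) / (1 - q0))
    x = total / num_leaves  # normalized sum, as in Eq. (5)
    return 0 if x <= t else 1


# Three leaf messages, each with P_0(Y = 1) = 0.25 and P_1(Y = 1) = 0.75.
print(llrq([1, 1, 0], [0.25] * 3, [0.75] * 3, num_leaves=3, t=0.0))
print(llrq([0, 0, 1], [0.25] * 3, [0.75] * 3, num_leaves=3, t=0.0))
```

Dividing by $l_n(v)$ rather than by the in-degree keeps the threshold on a per-observation scale, which is how $t$ enters the bounds of Section IV.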

As already discussed (cf. Eq. (2)), the optimal performance of a relay tree is always dominated
by that of a parallel configuration with the same number of leaves, i.e., $g_P^* \le g_R^*$. In Section
V, we find sufficient conditions under which the equality $g_R^* = g_P^*$ holds. Then, in Section V-F,
we look into necessary conditions for this to be the case. It turns out that non-trivial necessary
conditions for the equality $g_R^* = g_P^*$ to hold are, in general, difficult to obtain, because they
depend on the nature of the transmission functions available to the sensors. For example, if the
sensors are allowed to simply forward undistorted all of the messages that they receive, then
the equality $g_R^* = g_P^*$ holds trivially. Hence, we need to impose some restrictions on the set of
transmission functions available, as in the assumption that follows.

Assumption 4:

(a) There exists an $n_0 \ge 1$ such that for all $n \ge n_0$, we have $l_n(v) > 1$ for all $v$ in the set $B_n$ of
nodes whose immediate predecessors are all leaves.

(b) Let $X_1, X_2, \ldots$ be i.i.d. random variables under either hypothesis $H_j$, each with distribution
$P_j^X$. For $k > 1$, $\gamma_0 \in \Gamma(k)$, and $\gamma_i \in \Gamma$, $i = 1, \ldots, k$, let $\xi = (\gamma_0, \ldots, \gamma_k)$. We also let $\nu_j^\xi$ be
the distribution of $\gamma_0(\gamma_1(X_1), \ldots, \gamma_k(X_k))$ under hypothesis $H_j$. We assume that

$$g_P^* < \inf_{\xi \in \Gamma(k) \times \Gamma^k} \frac{1}{k} E_0\left[\log \frac{d\nu_1^\xi}{d\nu_0^\xi}\right], \qquad (6)$$

for all $k > 1$.

Assumption 4 holds in most cases of interest. Part (a) results in no loss of generality: if in
a relay tree we have $l_n(v) = 1$ for some $v \in B_n$, we can remove the predecessor of $v$, and
treat $v$ as a leaf. Regarding part (b), it is easy to see that the left-hand side (L.H.S.) of (6) is
always less than or equal to the R.H.S., hence we have only excluded those cases where (6)
holds with equality. We are essentially assuming that when the messages $\gamma_1(X_1), \ldots, \gamma_k(X_k)$
are summarized (or quantized) by $\gamma_0$, there is some loss of information, as measured by the
associated KL divergences.

C. Main Results

In this section, we collect and summarize our main results. The asymptotic proportion of
nodes that are leaves, defined by

$$z = \liminf_{n \to \infty} \frac{l_n(f)}{n},$$

plays a critical role.

Theorem 1: Consider a sequence of trees, $(T_n)_{n \ge 1}$, of bounded height. Suppose that
Assumptions 1-3 hold. Then,

(i) $g_P^* \le g_R^* < 0$ and $-D(P_0^X \| P_1^X) \le g^* \le z g_R^* < 0$.

(ii) If $z = 1$, then $g_P^* = g^* = g_R^*$.

(iii) If the number of non-leaf nodes is bounded, or if $\min_{v \in B_n} l_n(v) \to \infty$, then $g_P^* = g^* = g_R^*$.

(iv) If Assumption 4 also holds, we have $g_R^* = g_P^*$ iff $z = 1$.

Note that part (i) follows from (2), except for the strict negativity of the error exponents,
which is established in Proposition 2. Part (ii) is proved in Proposition 3. Part (iii) is proved in
Corollary 1. (Recall that $B_n$ is the set of non-leaf nodes all of whose immediate predecessors
are leaves.) Part (iv) is proved in Proposition 5. One might also have expected a result asserting
that $g_P^* \le g^*$. However, this is not true without additional assumptions, as will be discussed in
Section V-F.

IV. Error Bounds for h-Uniform Relay Trees

In this section, we consider a 1-bit $h$-uniform relay tree, in which all relay nodes at level
$k$ use a LLRQ with a common threshold $t_k$. We wish to develop upper bounds for the error
probabilities at the various nodes. We do this recursively, by moving along the levels of the tree, 
starting from the leaves. Given bounds on the error probabilities associated with the messages 
received by a node, we develop a bound on the log-moment generating function at that node (cf. 
Eq. (8)), and then use the standard Chernoff bound technique to develop a bound on the error 
probability for the message sent by that node (cf. Eq. (7)). 

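Let $t^{(k)} = (t_1, t_2, \ldots, t_k)$, for $k \ge 1$, and $t^{(0)} = \emptyset$. For $j = 0, 1$, $k \ge 1$, and $\lambda \in \mathbb{R}$, we define
recursively

$$\Lambda_{j,0}(\gamma; \lambda) = \Lambda_{j,0}(\gamma, \emptyset; \lambda) = \log E_j\left[\left(\frac{dP_1^\gamma}{dP_0^\gamma}\right)^\lambda\right],$$

$$\Lambda_{j,k}^*(\gamma, t^{(k)}) = \sup_{\lambda \in \mathbb{R}} \left\{\lambda t_k - \Lambda_{j,k-1}(\gamma, t^{(k-1)}; \lambda)\right\}, \qquad (7)$$

$$\Lambda_{j,k}(\gamma, t^{(k)}; \lambda) = \max\left\{-\Lambda_{1,k}^*(\gamma, t^{(k)})(j + \lambda),\ \Lambda_{0,k}^*(\gamma, t^{(k)})(j - 1 + \lambda)\right\}. \qquad (8)$$

The operation in (7) is known as the Fenchel-Legendre transform of $\Lambda_{j,k-1}(\gamma, t^{(k-1)}; \lambda)$ [27].
We will be interested in the case where

$$-D(P_0^\gamma \| P_1^\gamma) < 0 < D(P_1^\gamma \| P_0^\gamma), \qquad (9)$$

$$t_1 \in \left(-D(P_0^\gamma \| P_1^\gamma),\ D(P_1^\gamma \| P_0^\gamma)\right), \qquad (10)$$

$$t_k \in \left(-\Lambda_{1,k-1}^*(\gamma, t^{(k-1)}),\ \Lambda_{0,k-1}^*(\gamma, t^{(k-1)})\right), \quad \text{for } 1 < k \le h. \qquad (11)$$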


We now provide an inductive argument to show that the above requirements on the thresholds
$t_k$ are feasible. From Assumption 1, there exists a $\gamma \in \Gamma$ that satisfies (9), hence the constraint
(10) is feasible. Furthermore, the $\Lambda^*_{j,1}(\gamma, t^{(1)})$ are large deviations rate functions and are therefore
positive when $t_1$ satisfies (10) [27]. Suppose now that $k > 1$ and that $\Lambda^*_{j,k-1}(\gamma, t^{(k-1)}) > 0$. From
(8), $\Lambda_{j,k-1}(\gamma, t^{(k-1)}; \lambda)$ is the maximum of two linear functions of $\lambda$ (see Figure 7). Taking the
Fenchel-Legendre transform, and since $t_k$ satisfies (11), we obtain $\Lambda^*_{j,k}(\gamma, t^{(k)}) > 0$, which
completes the induction.



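Fig. 7. Typical plot of $\Lambda_{0,k-1}(\gamma, t^{(k-1)}; \lambda)$, $k \ge 2$. The plot is the maximum of two lines, with slopes $-\Lambda^*_{1,k-1}(\gamma, t^{(k-1)})$ and $\Lambda^*_{0,k-1}(\gamma, t^{(k-1)})$, shown together with a line of slope $t_k$.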
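From the definitions of $\Lambda_{j,k}$ and $\Lambda^*_{j,k}$, the following relations can be established. The proof
consists of straightforward algebraic manipulations and is omitted.

Lemma 2: Suppose that $\gamma \in \Gamma$ satisfies (9), and $t^{(h)}$ satisfies (10)-(11). For $k \ge 1$, we have

\[ \Lambda^*_{1,k}(\gamma, t^{(k)}) = \Lambda^*_{0,k}(\gamma, t^{(k)}) - t_k. \]

Furthermore, the supremum in (7) is achieved at some $\lambda \in (-1, 0)$ for $j = 1$, and $\lambda \in (0, 1)$ for
$j = 0$. For $k \ge 2$, we have

\[ \Lambda^*_{1,k}(\gamma, t^{(k)}) = \frac{\Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) \big( \Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) - t_k \big)}{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) + \Lambda^*_{1,k-1}(\gamma, t^{(k-1)})}, \]

\[ \Lambda^*_{0,k}(\gamma, t^{(k)}) = \frac{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) \big( \Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) + t_k \big)}{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) + \Lambda^*_{1,k-1}(\gamma, t^{(k-1)})}. \]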
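The recursion in Lemma 2 is easy to evaluate numerically. The following Python sketch does so under an assumed leaf model that is not part of the paper: each leaf message is a single bit with P_j(Y = 1) = p_j. The function names, the ternary search used for the level-1 Fenchel-Legendre transform, and the example parameters are all illustrative assumptions.

```python
import math

def log_mgf0(lam, p0, p1):
    """Lambda_{0,0}(gamma; lam) for a 1-bit leaf message with P_j(Y = 1) = p_j."""
    return math.log((1 - p0) ** (1 - lam) * (1 - p1) ** lam
                    + p0 ** (1 - lam) * p1 ** lam)

def concave_max(f, lo, hi, iters=200):
    """Maximum value of a concave function on [lo, hi], by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

def rate_functions(p0, p1, thresholds):
    """Return [(Lambda*_{0,k}, Lambda*_{1,k}) for k = 1, ..., h]."""
    d01 = p0 * math.log(p0 / p1) + (1 - p0) * math.log((1 - p0) / (1 - p1))
    d10 = p1 * math.log(p1 / p0) + (1 - p1) * math.log((1 - p1) / (1 - p0))
    t1 = thresholds[0]
    if not (-d01 < t1 < d10):
        raise ValueError("t_1 violates condition (10)")
    # Level 1: transform (7) of the leaf log-MGFs. By Lemma 2 the supremum
    # lies in (0, 1) for j = 0 and in (-1, 0) for j = 1.
    # Lambda_{1,0}(lam) equals Lambda_{0,0}(lam + 1) by a change of measure.
    l0 = concave_max(lambda x: x * t1 - log_mgf0(x, p0, p1), 0.0, 1.0)
    l1 = concave_max(lambda x: x * t1 - log_mgf0(x + 1.0, p0, p1), -1.0, 0.0)
    out = [(l0, l1)]
    for t in thresholds[1:]:
        if not (-l1 < t < l0):
            raise ValueError("threshold violates condition (11)")
        s = l0 + l1
        # Closed forms of Lemma 2 for k >= 2.
        l0, l1 = l0 * (l1 + t) / s, l1 * (l0 - t) / s
        out.append((l0, l1))
    return out
```

Calling `rate_functions(0.2, 0.7, [t, t, t])` with a common threshold `t` slightly above $-D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma})$ returns one pair per level; each pair can be checked against the identity $\Lambda^*_{1,k} = \Lambda^*_{0,k} - t_k$.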
Proposition 1 below, whose proof is provided in the Appendix, will be our main tool in
obtaining upper bounds on error probabilities. It shows that the Type I and II error exponents
are essentially upper bounded by $-\Lambda^*_{0,h}(\gamma, t^{(h)})$ and $-\Lambda^*_{1,h}(\gamma, t^{(h)})$ respectively. Recall that $p_n(v)$
is the total number of predecessors of $v$, $l_n(v)$ is the number of leaves in the sub-tree rooted at
$v$, and $B_n$ is the set of nodes all of whose immediate predecessors are leaves.

Proposition 1: Fix some $h \ge 1$, and consider a sequence of trees $(T_n)_{n \ge 1}$ such that for all
$n \ge n_0$, $T_n$ is $h$-uniform. Suppose that Assumptions 1-2 hold. Suppose that, for every $n$, every
leaf uses the same transmission function $\gamma \in \Gamma$, which satisfies (9), and that every level $k$ node
($k \ge 1$) uses a LLRQ with threshold $t_k$, satisfying (10)-(11).

(i) For all nodes $v$ of level $k \ge 1$ and for all $n \ge n_0$, we have

\[ \frac{1}{l_n(v)} \log \mathbb{P}_1\Big( \frac{S_n(v)}{l_n(v)} \le t_k \Big) \le -\Lambda^*_{1,k}(\gamma, t^{(k)}) + \frac{p_n(v)}{l_n(v)} - 1, \]

\[ \frac{1}{l_n(v)} \log \mathbb{P}_0\Big( \frac{S_n(v)}{l_n(v)} > t_k \Big) \le -\Lambda^*_{0,k}(\gamma, t^{(k)}) + \frac{p_n(v)}{l_n(v)} - 1. \]

(ii) Suppose that for all $n \ge n_0$ and all $v \in B_n$, we have $l_n(v) \ge N$. Then, for all $n \ge n_0$, we
have

\[ \frac{1}{l_n(f)} \log \mathbb{P}_1\Big( \frac{S_n(f)}{l_n(f)} \le t_h \Big) \le -\Lambda^*_{1,h}(\gamma, t^{(h)}) + \frac{h}{N}, \]

\[ \frac{1}{l_n(f)} \log \mathbb{P}_0\Big( \frac{S_n(f)}{l_n(f)} > t_h \Big) \le -\Lambda^*_{0,h}(\gamma, t^{(h)}) + \frac{h}{N}. \]
V. Optimal Error Exponent

In this section, we show that the Type II error probability in a sequence of bounded height
trees falls exponentially fast with the number of nodes. We derive sufficient conditions for the
error exponent to be the same as that of a parallel configuration. We show that if almost all of
the nodes are leaves, i.e., $z = 1$, then $g^*_P = g^* = g^*_R$. The condition $z = 1$ is also equivalent
to another condition that requires that the proportion of leaves attached to bounded degree
nodes vanishes asymptotically. We also show that under some additional mild assumptions, this
sufficient condition is necessary. We start with some graph-theoretic preliminaries.

A. Properties of Trees. 

In this section, we define various quantities associated with a tree, and derive a few elementary 
relations that will be used later. 

Recall that $B_n$ is the set of non-leaf nodes all of whose predecessors are leaves. (For an
$h$-uniform tree, $B_n$ is the set of all level 1 nodes.) For $N > 0$, let

\[ F_{N,n} = \{ v \in B_n : l_n(v) \le N \}, \qquad F^c_{N,n} = \{ v \in B_n : l_n(v) > N \}, \tag{12} \]

and

\[ q_{N,n} = \frac{1}{l_n(f)} \sum_{v \in F_{N,n}} l_n(v), \]

where the sum is taken to be zero if the set $F_{N,n}$ is empty. Let $q_N = \limsup_{n \to \infty} q_{N,n}$. For a sequence
of $h$-uniform trees, this is the asymptotic proportion of leaves that belong to "small" subtrees
in the network.

It turns out that it is easier to work with $h$-uniform trees. For this reason, we show how to
transform any tree of height $h$ to an $h$-uniform tree.

Height Uniformization Procedure. Consider a tree $T_n = (V_n, E_n)$ of height $h$, and a node $v$
that has at least one leaf as an immediate predecessor ($v \in A_n$). Let $D_n$ be the set of leaves that
are immediate predecessors of $v$, and whose paths to the fusion center $f$ are of length $k < h$.
Add $h - k$ nodes, $\{u_j : j = 1, \ldots, h - k\}$, to $V_n$; remove the edges $(u, v)$, for all $u \in D_n$;
add the edges $(u_1, v)$, and $(u_{j+1}, u_j)$, for $j = 1, \ldots, h - k - 1$; add the edges $(u, u_{h-k})$, for all
$u \in D_n$. This procedure is repeated for all $v \in A_n$. The resulting tree is $h$-uniform. □
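The procedure above can be sketched in a few lines of Python. The representation is an assumption made for illustration: the tree is a dictionary mapping each node to its immediate successor (the fusion center maps to `None`), and the inserted relay nodes are given hypothetical tuple names.

```python
from collections import defaultdict

def depth(parent, v):
    """Number of hops from node v to the fusion center (the root)."""
    d = 0
    while parent[v] is not None:
        v = parent[v]
        d += 1
    return d

def uniformize(parent):
    """Return a new child -> successor map in which every leaf is exactly
    h hops from the root, where h is the height of the input tree."""
    children = defaultdict(list)
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)
    leaves = [u for u in parent if u not in children]
    h = max(depth(parent, u) for u in leaves)

    # Leaves sharing an immediate successor v all have the same depth,
    # so they can be re-attached as one group (the set D_n for that v).
    by_successor = defaultdict(list)
    for u in leaves:
        by_successor[parent[u]].append(u)

    new_parent = dict(parent)
    for v, group in by_successor.items():
        k = depth(parent, group[0])
        if k == h:
            continue
        prev = v
        for j in range(1, h - k + 1):      # chain u_1, ..., u_{h-k}
            relay = ("relay", v, j)        # hypothetical name for u_j
            new_parent[relay] = prev
            prev = relay
        for u in group:                    # re-attach leaves to u_{h-k}
            new_parent[u] = prev
    return new_parent
```

The number of leaves is unchanged by the procedure, while the total number of nodes can only grow.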
The height uniformization procedure essentially adds more nodes to the network, and
re-attaches some leaves, so that the path from every leaf has exactly $h$ hops. Let $(T'_n = (V'_n, E'_n))_{n \ge 1}$
be the new sequence of $h$-uniform trees obtained from $(T_n)_{n \ge 1}$, after applying the uniformization
procedure. (We are abusing notation here in that $T'_n$ typically does not have $n$ nodes, nor is the
sequence $|V'_n|$ increasing.) Regarding notation, we adopt the convention that quantities marked
with a prime are defined with respect to $T'_n$.

Note that $l'_n(f) = l_n(f)$. For the case of a relay network, it is seen that any function of the
observations at the leaves that can be computed in $T'_n$ can also be computed in $T_n$. Thus, the
detection performance of $T'_n$ is no better than that of $T_n$. Hence, we obtain

\[ g^*_R \le \limsup_{n \to \infty} \frac{1}{l_n(f)} \log \beta^*(T'_n). \tag{14} \]

Therefore, any upper bound derived for $h$-uniform trees readily translates to an upper bound for
general trees. On the other hand, the coefficients $q_N$ for the $h$-uniform trees $T'_n$ (to be denoted
by $q'_N$) are different from the coefficients $q_N$ for the original sequence $T_n$. They are related as
follows. The proof is given in the Appendix.

Lemma 3: For any $N, M > 0$, we have

\[ q'_N \le h(N q_M + N/M). \]

In particular, if $q_N = 0$ for all $N > 0$, then $q'_N = 0$ for all $N > 0$.

It turns out that the condition $z = 1$ is equivalent to the condition $q_N = 0$ for all $N > 0$. The
proof is provided in the Appendix.

Lemma 4: We have $z = 1$ iff $q_N = 0$ for all $N > 0$.
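To make the two quantities in Lemma 4 concrete, the sketch below computes the finite-$n$ leaf fraction $l_n(f)/n$ and the coefficient $q_{N,n}$ for a 2-uniform tree. Describing the tree by the list of leaf counts of its level 1 nodes, as well as the function names, are assumptions made for illustration.

```python
def leaf_fraction(leaf_counts):
    """l_n(f)/n for a 2-uniform tree whose level 1 nodes have these leaf counts."""
    leaves = sum(leaf_counts)
    n = 1 + len(leaf_counts) + leaves   # fusion center + level 1 nodes + leaves
    return leaves / n

def q_small(leaf_counts, N):
    """q_{N,n}: fraction of leaves attached to level 1 nodes with at most N leaves."""
    leaves = sum(leaf_counts)
    return sum(c for c in leaf_counts if c <= N) / leaves

for m in (10, 100, 1000):
    sparse = [2] * m   # every level 1 node has 2 leaves
    bushy = [m] * m    # every level 1 node has m leaves
    print(m, leaf_fraction(sparse), q_small(sparse, 2),
          leaf_fraction(bushy), q_small(bushy, 2))
```

In the first family the leaf fraction stays near 2/3 and $q_{2,n} = 1$; in the second the leaf fraction approaches 1 and $q_{2,n} = 0$ once $m > 2$, in line with the equivalence stated in the lemma.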

B. An Upper Bound 

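In this section, we develop an upper bound on the Type II error probabilities, which takes
into account some qualitative properties of the sequence of trees, as captured by $q_N$.

Lemma 5: Consider an $h$-uniform sequence of trees $(T_n)_{n \ge 1}$, and suppose that Assumptions
1-3 hold. For every $\epsilon > 0$, there exists some $N$ such that

\[ g^*_R \le (1 - q_N)(g^*_P + \epsilon). \]

Proof: If $g^*_P + \epsilon \ge 0$, there is nothing to prove, since $q_N \le 1$ and $g^*_R \le 0$. Suppose that
$g^*_P + \epsilon < 0$. Choose $\gamma \in \Gamma$ such that

\[ -D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma}) \le -\sup_{\gamma' \in \Gamma} D(\mathbb{P}_0^{\gamma'} \,\|\, \mathbb{P}_1^{\gamma'}) + \frac{\epsilon}{2} = g^*_P + \frac{\epsilon}{2} < 0. \]

Let $t_k = t = -D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma}) + \epsilon/2 \le g^*_P + \epsilon$, for $k = 1, \ldots, h$, and note that

\[ -D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma}) < t < 0. \tag{15} \]

Because of (15), we have $\Lambda^*_{0,1}(\gamma, t^{(1)}) > 0$. Furthermore, using Lemma 2, $\Lambda^*_{1,1}(\gamma, t^{(1)}) =
\Lambda^*_{0,1}(\gamma, t^{(1)}) - t > -t$. Now let $k \ge 2$, and suppose that $\Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) > -t$ and $\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) > 0$.
From Lemma 2,

\[ \Lambda^*_{0,k}(\gamma, t^{(k)}) = \frac{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) \big( \Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) + t \big)}{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) + \Lambda^*_{1,k-1}(\gamma, t^{(k-1)})} > 0, \]

and

\[ \Lambda^*_{1,k}(\gamma, t^{(k)}) = \Lambda^*_{0,k}(\gamma, t^{(k)}) - t_k = \Lambda^*_{0,k}(\gamma, t^{(k)}) - t > -t. \]

Hence, by induction, $t_k$ satisfies (10)-(11), so that Proposition 1 can be applied.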

Choose $N$ sufficiently large so that $h/N < \Lambda^*_{0,h}(\gamma, t^{(h)})$. If $q_N = 1$, the claimed result holds
trivially. Hence, we assume that $q_N \in [0, 1)$. In this case, for $n$ sufficiently large, there exists
at least one node in $B_n$ so that $l_n(v) > N$. We remove all nodes $v \in B_n$ with $l_n(v) \le N$,
and their immediate predecessors. Then, we remove all level 2 nodes $v$ that no longer have any
predecessors, and so on. In this way, we obtain an $h$-uniform subtree of $T_n$, to be denoted by
$T''_n$. (Quantities marked with double primes are defined w.r.t. $T''_n$.) We have $l''_n(v) > N$ for all
$v \in B''_n$, and $l''_n(f) = \sum_{v \in F^c_{N,n}} l_n(v) = l_n(f)(1 - q_{N,n})$. Consider the following relay strategy on
the tree $T''_n$. (Since this is a subtree of $T_n$, this is also a relay strategy for the tree $T_n$, with some
nodes remaining idle.) The leaves transmit with transmission function $\gamma$, and the other nodes use
a 1-bit LLRQ with threshold $t$. (Note that in the definition (5) of the normalized log-likelihood
ratio, the denominator $l_n(v)$ now becomes $l''_n(v)$.)
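We first show that the strategy just described is admissible. We apply part (ii) of Proposition
1 to $T''_n$, to obtain

\[ \begin{aligned}
\limsup_{n \to \infty} \frac{1}{l_n(f)} \log \mathbb{P}_0(Y_f = 1)
&\le (1 - q_N) \limsup_{n \to \infty} \frac{1}{l''_n(f)} \log \mathbb{P}_0\Big( \frac{S_n(f)}{l''_n(f)} > t \Big) \\
&\le (1 - q_N) \Big( -\Lambda^*_{0,h}(\gamma, t^{(h)}) + \frac{h}{N} \Big) < 0,
\end{aligned} \]

hence $\mathbb{P}_0(Y_f = 1) \le \alpha$, when $n$ is sufficiently large.

To bound the Type II error probability, we use Proposition 1 and Lemma 2, to obtain

\[ \begin{aligned}
g^*_R &\le \limsup_{n \to \infty} \frac{1}{l_n(f)} \log \mathbb{P}_1(Y_f = 0) \\
&\le (1 - q_N) \Big( -\Lambda^*_{1,h}(\gamma, t^{(h)}) + \frac{h}{N} \Big) \\
&= (1 - q_N) \Big( t - \Lambda^*_{0,h}(\gamma, t^{(h)}) + \frac{h}{N} \Big) \\
&\le (1 - q_N)\, t \\
&\le (1 - q_N)(g^*_P + \epsilon).
\end{aligned} \]

This proves the lemma. □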

C. Exponential decay of error probabilities 

We now establish that Type II error probabilities decay exponentially. The bounded height
assumption is crucial for this result. Indeed, for the case of a tandem configuration, the exponential
decay property does not seem to hold.

Proposition 2: Consider a sequence of trees of height $h$, and let Assumptions 1-3 hold. Then,

\[ -\infty < g^*_P \le g^*_R < 0 \quad \text{and} \quad -\infty < -D(\mathbb{P}_0^X \,\|\, \mathbb{P}_1^X) \le g^* < 0. \]

Proof: The lower bounds on $g^*_R$ and $g^*$ follow from (2). Note that $g^*_P$ cannot be equal to
$-\infty$ because it cannot be better than the error exponent of a parallel configuration in which all
the observations are provided uncompressed to the fusion center. The error exponent in the latter
case is $-D(\mathbb{P}_0^X \,\|\, \mathbb{P}_1^X)$, by Stein's Lemma, and is finite as a consequence of Assumption 2.

It remains to show that the optimal error exponents are negative. Every tree of height $h$ satisfies
$n \le l_n(f)h + 1$. From (2), we obtain $g^* \le g^*_R / h$. Therefore, we only need to show that $g^*_R < 0$.
As discussed in connection to (14), we can restrict attention to a sequence of $h$-uniform trees.
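The step from the node count to the bound on $g^*$ can be spelled out as follows, assuming only the inequality $g^* \le z g^*_R$ from (2) and the sign $g^*_R \le 0$:

\[ n \le l_n(f)h + 1 \;\Longrightarrow\; \frac{l_n(f)}{n} \ge \frac{n - 1}{hn} \;\Longrightarrow\; z = \liminf_{n \to \infty} \frac{l_n(f)}{n} \ge \frac{1}{h} \;\Longrightarrow\; g^* \le z g^*_R \le \frac{g^*_R}{h}. \]

The last inequality holds because multiplying the non-positive number $g^*_R$ by the larger factor $z \ge 1/h$ can only decrease it.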

We use induction on $h$. If $h = 1$, we have a parallel configuration and the result follows from
[22]. Suppose that the result is true for all sequences of $(h - 1)$-uniform trees. Consider now a
sequence of $h$-uniform trees. Let $\epsilon > 0$ be such that $g^*_P + \epsilon < 0$. From Lemma 5, there exists
some $N$ such that $g^*_R \le (1 - q_N)(g^*_P + \epsilon)$. If $q_N < 1$, we readily obtain the inequality $g^*_R < 0$.

Suppose now that $q_N = 1$. We only need to consider a sequence $(n_k)_{k \ge 1}$ such that
$\lim_{k \to \infty} q_{N,n_k} = 1$. Using the inequality (22), we have

\[ \frac{|F_{N,n_k}|}{l_{n_k}(f)} \ge \frac{q_{N,n_k}}{N}, \]

and

\[ \liminf_{k \to \infty} \frac{|F_{N,n_k}|}{l_{n_k}(f)} \ge \frac{1}{N}. \tag{16} \]

For each node $v \in B_n$, we remove all of its immediate predecessors (leaves) except for one,
call it $u$. The leaf $u$ transmits $\gamma(X_u)$ to its immediate successor $v$. Since node $v$ receives only a
single message, it just forwards it to its immediate successor. The resulting performance is the
same as if the nodes $v$ in $B_n$ were making a measurement $X_v$ and transmitting $\gamma(X_v)$ to their
successor. This is equivalent to deleting all the leaves of $T_n$ to form a new tree, $T''_n$, which is
$(h - 1)$-uniform. The above argument shows that $\beta^*(T_{n_k}) \le \beta^*(T''_{n_k})$.

We have $l''_{n_k}(f) = |B_{n_k}|$ and from (16),

\[ \liminf_{k \to \infty} \frac{|B_{n_k}|}{l_{n_k}(f)} \ge \liminf_{k \to \infty} \frac{|F_{N,n_k}|}{l_{n_k}(f)} \ge \frac{1}{N}. \]

Therefore,

\[ \limsup_{k \to \infty} \frac{1}{l_{n_k}(f)} \log \beta^*(T_{n_k}) \le \frac{1}{N} \limsup_{k \to \infty} \frac{1}{l''_{n_k}(f)} \log \beta^*(T''_{n_k}). \]

By the induction hypothesis, the right-hand side in the above inequality is negative and the proof
is complete. □
D. Sufficient Conditions for Matching the Performance of the Parallel Configuration

We are now ready to prove the main result of this section. It shows that when $q_N = 0$ for
all $N > 0$, or equivalently when $z = 1$ (cf. Lemma 4), bounded height tree networks match the
performance of the parallel configuration.

Proposition 3: Consider a sequence of trees of height $h$ in which $z = 1$, or equivalently
$q_N = 0$ for all $N > 0$. Suppose that Assumptions 1-3 hold. Then,

\[ g^*_P = g^* = g^*_R. \]

Furthermore, if the sequence of trees is $h$-uniform, the optimal error exponent does not change
even if we restrict to relay strategies in which every leaf uses the same transmission function
and all other nodes use a 1-bit LLRQ with the same threshold.

Proof: We have shown $g^*_P \le g^*_R$ in (2). We now prove that $g^*_R \le g^*_P$. As already explained,
there is no loss in generality in assuming that the sequence of trees is $h$-uniform (by performing
the height uniformization procedure, and using Lemma 3).
For any $\epsilon > 0$, Lemma 5 yields

\[ g^*_R \le g^*_P + \epsilon. \]

Letting $\epsilon \to 0$, we obtain $g^*_R \le g^*_P$, hence $g^*_R = g^*_P$. From (2) with $z = 1$, we obtain
$g^* \le g^*_R = g^*_P$.

We now show that $g^* \ge g^*_P$. Consider a tree with $n$ nodes, $l_n(f)$ of which are leaves. We will
compare it with another sensor network in which $l_n(f)$ nodes $v$ transmit a message $\gamma_v(X_v)$ to
the fusion center and $n - l_n(f) - 1$ nodes transmit their raw observations to the fusion center.
The latter network can simulate the original network, and therefore its optimal error exponent is
at least as good. By a standard argument (similar to the one in Proposition 4 below), the optimal
error exponent in the latter network can be shown to be greater than or equal to

\[ \limsup_{n \to \infty} \frac{l_n(f)}{n}\, g^*_P - \limsup_{n \to \infty} \frac{n - l_n(f) - 1}{n}\, D(\mathbb{P}_0^X \,\|\, \mathbb{P}_1^X) = g^*_P, \]

hence concluding the proof. □
Fix an $\epsilon \in (0, -g^*_P)$. For any tree sequence with $z = 1$, we can perform the height
uniformization procedure to obtain an $h$-uniform sequence of trees. In practice, this height uniformization
procedure may be performed virtually at each node, so that the tree sequence simulates an
$h$-uniform tree sequence. A simple strategy on the height uniformized tree sequence that $\epsilon$-achieves
the optimal error exponent is a relay strategy in which:

(i) all leaves transmit with the same transmission function $\gamma \in \Gamma$, where $\gamma$ is chosen such that
$-D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma}) \le g^*_P + \epsilon/2$;

(ii) all other nodes use 1-bit LLRQs with the same threshold $t = -D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma}) + \epsilon/2$.

Lemmas 3 and 4, and the proof of Lemma 5, show that this relay strategy $\epsilon$-achieves the optimal
error exponent $g^*_R = g^* = g^*_P$. This also shows that there is no loss in optimality even if we
restrict the relay nodes to use only 1-bit LLRQs. This may be useful in situations where the
nodes are simple, low-cost devices.
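The strategy in (i)-(ii) can be evaluated exactly on a small example. The Python sketch below assumes a hypothetical setting that is not from the paper: a 2-uniform relay tree with $m$ level 1 nodes, each with $r$ leaves, where every leaf message is one bit with P_j(Y = 1) = p_j. Every non-leaf node applies a 1-bit LLRQ with the same threshold $t$, and the error probabilities at the fusion center are obtained by summing binomial terms. Function and parameter names are illustrative.

```python
import math

def stage(n_in, llr_zero, llr_one, p_one, p_zero, t, n_leaves):
    """Error probabilities of a 1-bit LLRQ fed by n_in i.i.d. binary messages.

    llr_zero, llr_one: log-likelihood ratio carried by an incoming 0 or 1.
    p_one = (P_0(msg = 1), P_1(msg = 1)); p_zero = (P_0(msg = 0), P_1(msg = 0)).
    The node outputs 1 iff (sum of incoming LLRs) / n_leaves > t.
    Returns (P_0(output = 1), P_1(output = 0)).
    """
    false_alarm = miss = 0.0
    for ones in range(n_in + 1):
        s = ones * llr_one + (n_in - ones) * llr_zero
        c = math.comb(n_in, ones)
        if s / n_leaves > t:
            false_alarm += c * p_one[0] ** ones * p_zero[0] ** (n_in - ones)
        else:
            miss += c * p_one[1] ** ones * p_zero[1] ** (n_in - ones)
    return false_alarm, miss

def two_level_errors(p0, p1, r, m, t):
    """Type I and Type II error probabilities at the fusion center."""
    # Level 1: each relay sees r leaf bits with P_j(bit = 1) = p_j.
    fa, ms = stage(r, math.log((1 - p1) / (1 - p0)), math.log(p1 / p0),
                   (p0, p1), (1 - p0, 1 - p1), t, r)
    # Fusion center: m relay bits, normalized by the m * r leaves below it.
    # Assumes 0 < fa < 1 and 0 < ms < 1 so that the relay LLRs are finite.
    return stage(m, math.log(ms / (1 - fa)), math.log((1 - ms) / fa),
                 (fa, 1 - ms), (1 - fa, ms), t, m * r)
```

For such a tree, $p_n(f)/l_n(f) - 1 = 1/r$, so Proposition 1(i) together with $\Lambda^*_{1,2} > -t$ implies that the normalized log Type II error, `math.log(type2) / (m * r)`, stays below $t + 1/r$. The statements in the text are asymptotic: at small sizes the Type I error returned by this sketch need not be small.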

Proposition 3 provides sufficient conditions for a sequence of trees to achieve the same error
exponent as the parallel configuration. We note a few special cases in which these sufficient
conditions are satisfied. The first one is the case where there is a finite bound on the number
of nodes that are not leaves. In that case, $z$ is easily seen to be 1. This is consistent with the
conclusion of Example 1, where a simpler argument was used. The second is the more general
case where nodes in $B_n$ are attached to a growing number of leaves, which implies that $q_N = 0$
for all $N > 0$.

Corollary 1: Suppose that Assumptions 1-3 hold. Suppose further that either of the following
conditions holds:

(i) There is a finite bound on the number of nodes that are not leaves.

(ii) We have $\min_{v \in B_n} l_n(v) \to \infty$.

Then, $g^*_P = g^* = g^*_R$.

The above corollary can be applied to Example 2. In that example, every level 1 node has $m$
leaves attached to it, with $m$ growing large as $n$ increases. Therefore, the tree network satisfies
condition (ii) in Corollary 1, and the optimal error exponent is $g^* = g^*_R = g^*_P$. In this case, even
if the number $N$ of level 1 nodes grows much faster than $m$, we still achieve the same error
exponent as the parallel configuration. The above proposed strategy, in which every leaf uses
the same transmission function, and every node uses the same LLRQ, will nearly achieve the
optimal performance.

We are now in a position to determine the optimal error exponent in Example 4.

Example 4, revisited: Recall that in Example 4, every $v_i \in B_n$ has $i + 1$ predecessors. It is
easy to check that $z = 1$. From Proposition 3, the optimal error exponent is the same as that for
the parallel configuration. □

E. Discussion of the Sufficient Conditions 

Proposition 3 is unexpected as it establishes that the performance of a tree possessing certain 
qualitative properties is comparable to that of the parallel configuration. Furthermore, the optimal 
performance is obtained even if we restrict the non-leaf nodes to use 1-bit LLRQs. At first sight, 
it might appear intuitive that if the leaves dominate in a relay tree (z = 1), then the tree 
should always have the same performance as a parallel configuration. However, this intuition is
misleading: it fails under a Bayesian formulation, in which both the Type I and II
error probabilities are required to decay at the same rate. To see this, consider the
2-uniform tree in Figure 3, where every node is constrained to sending 1-bit messages. Suppose
we are given nonzero prior probabilities $\pi_0$ and $\pi_1$ for the hypotheses $H_0$ and $H_1$. Instead of
the Neyman-Pearson criterion, suppose that we are interested in minimizing the error exponent

\[ \limsup_{n \to \infty} \frac{1}{l_n(f)} \log P_e^*, \]

where $P_e^*$ is the minimum of the error probability $\pi_0 \mathbb{P}_0(Y_f = 1) + \pi_1 \mathbb{P}_1(Y_f = 0)$, optimized
over all strategies. It can be shown that to obtain the optimal error exponent, we only need to
consider the following two fusion rules: (a) the fusion center declares $H_0$ iff both $v_1$ and $v_2$ send
a 0, or (b) the fusion center declares $H_1$ iff both $v_1$ and $v_2$ send a 1. Then, using the results in
[28], the optimal error exponent for this tree network is strictly worse than that for the parallel
configuration. Similarly, if we constrain the Type I error in the Neyman-Pearson criterion to
decay faster than a predetermined rate, it can be shown that the optimal Type II error exponent
for a tree network can be strictly worse than that of a parallel configuration.

Note that the bounded height assumption is essential in proving $g^* = g^*_R = g^*_P$, when $z = 1$.
Although our technique can be extended to include those tree sequences whose height grows
very slowly compared to $n$ (on the order of $\log |\log(n/l_n(f) - 1)|$), we have not been able to
find the optimal error exponent for the general case of unbounded height. As noted before, in a 
tandem network, the Bayesian error probability decays sub-exponentially fast [26]. The proof of 
Proposition 2 in [26] involves the construction of a tree network, with unbounded height, and
in which $z = 1$. In that proof, it is also shown that such a network has a sub-exponential rate
of error decay. We conjecture that this is also the case for the Neyman-Pearson formulation.

In summary, for a tree network to achieve the same Type II error exponent as a parallel
configuration, we require that the tree sequence have a bounded height, satisfy the condition
$z = 1$, and that the error criterion be the Neyman-Pearson criterion. Without any one of these
three conditions, our results no longer hold.

F. A Necessary Condition for Matching the Performance of the Parallel Configuration 

In this section, we establish necessary conditions under which a sequence of relay trees with 
bounded height performs as well as a parallel configuration. As noted in Section III-B, any 
necessary conditions generally depend on the type of transmission functions available to the 
relay nodes. However, under an additional condition (Assumption 4), the sufficient condition for 
$g^*_R = g^*_P$ in Proposition 3 is also necessary.

Proposition 4: Suppose that Assumptions 1, 2 and 4 hold, and $h \ge 2$. If there exists some
$N > 0$ such that $q_N > 0$ (equivalently, $z < 1$), then $g^*_P < g^*_R$.

Proof: Fix some $N > 0$ and suppose that $q_N > 0$. Given a tree $T_n$, we construct a new
tree $T''_n$, as follows. We remove all nodes other than the leaves and the nodes in $F_{N,n}$. For all the
leaves $u$ that are not immediate predecessors of some $v \in F_{N,n}$, we let $u$ transmit its message
directly to the fusion center. We add new edges $(v, f)$, for each $v \in F_{N,n}$. This gives us a tree
$T''_n$ of height 2, with $l''_n(f) = l_n(f)$ and $q''_N = q_N$. The latter tree $T''_n$ can simulate the tree $T_n$,
hence the optimal error exponent associated with the sequence $(T_n)_{n \ge 1}$ is bounded below by
the optimal error exponent associated with the sequence $(T''_n)_{n \ge 1}$. Therefore, without loss of
generality, we only need to prove the proposition for a sequence of trees of height 2, and in
which $F_{N,n} = B_n$, for some $N > 0$ such that $q_N > 0$; we henceforth assume that this is the
case. The rest of the argument is similar to the proof of Stein's Lemma in Lemma 3.4.7 of [27].
Suppose that a particular admissible relay strategy has been fixed, and let $\beta_n$ be the associated
Type II error probability. Let $\lambda_n = \mathbb{E}_0[S_n(f)]/l_n(f)$. We show that $S_n(f)/l_n(f)$ is close to $\lambda_n$
in probability. Let $D_n$ be the set of leaves that transmit directly to the fusion center. The proof
of the following lemma is in the Appendix.

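Lemma 6: For all $\eta > 0$, $\mathbb{P}_0(|S_n(f)/l_n(f) - \lambda_n| > \eta) \to 0$, as $n \to \infty$.

We return to the proof of Proposition 4. Given the transmission functions at all other nodes,
the fusion center will optimize performance by using an appropriate likelihood ratio test, with a
(possibly randomized) threshold. We can therefore assume, without loss of generality, that this
is the case. We let $\zeta_n$ be the threshold chosen, and note that it must satisfy

\[ \mathbb{P}_0\big( S_n(f)/l_n(f) \le \zeta_n \big) \ge 1 - \alpha. \tag{17} \]

From a change of measure argument (see Lemma 3.4.7 in [27]), we have for $\eta > 0$,

\[ \frac{1}{l_n(f)} \log \beta^*(T_n) \ge \lambda_n - \eta + \frac{1}{l_n(f)} \log \mathbb{P}_0\Big( \lambda_n - \eta < \frac{S_n(f)}{l_n(f)} \le \zeta_n \Big). \]

Using (17) and Lemma 6, we see that the last term goes to 0 as $n \to \infty$. We also have

\[ \lambda_n = \frac{1}{l_n(f)} \Big( \sum_{v \in D_n} \mathbb{E}_0\Big[ \log \frac{d\mathbb{P}_1^{\gamma_v}}{d\mathbb{P}_0^{\gamma_v}} \Big] + \sum_{v \in F_{N,n}} \mathbb{E}_0[L_{v,n}] \Big) \ge (1 - q_{N,n})\, g^*_P + q_{N,n} K, \]

where, using the notation in Assumption 4,

\[ K = \inf_{\substack{1 \le k \le N \\ \xi \in \Gamma(k) \times \Gamma^k}} \frac{1}{k}\, \mathbb{E}_0\Big[ \log \frac{d\mathbb{P}_1^{\xi}}{d\mathbb{P}_0^{\xi}} \Big] > g^*_P. \]

Then, letting $n \to \infty$, we have

\[ g^*_R \ge (1 - q_N)\, g^*_P + q_N K - \eta, \]

for all $\eta > 0$. Taking $\eta \to 0$ completes the proof. □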
The condition that there exists a finite $N$ such that $l_n(v) \le N$ for a non-vanishing proportion
of nodes, in the statement of Proposition 4, can be thought of as corresponding to a situation
where relay nodes are of two different types: high cost relays that can process a large number
of received messages ($l_n(v) \to \infty$) and low cost relays that can only process a limited number
of received messages ($l_n(v) \le N$ for some small $N$). From this perspective, Proposition 4 states
that a tree network of height greater than one, with a nontrivial proportion of low cost relays,
will always have a performance worse than that of a parallel configuration.
Together with Proposition 3, we have shown the following.

Proposition 5: Suppose that Assumptions 1-4 hold. Then, $g^*_R = g^*_P$ iff $z = 1$ (or equivalently,
iff $q_N = 0$ for all $N > 0$).
We close with an example in which $z < 1$ and $g^* < g^*_P$. Since there are also easy examples
where $z < 1$ and $g^*_P < g^*$, this suggests that one can combine them to construct examples where
$z < 1$ and $g^* = g^*_P$. Thus, unlike the case of a relay tree, $z = 1$ is not a necessary condition for
$g^* = g^*_P$.


Example 5: Consider the tree network shown in Figure 8, where every node makes a 3-bit 
observation. Each leaf then compresses its 3-bit observation to a 1-bit message, while each level 
1 node is allowed to send a 4-bit message. (Recall that our framework allows for different
transmission function sets $\Gamma(d)$ at the different levels.) We assume Assumptions 1-3 hold.
Moreover, we assume that this network satisfies Assumption 4. 



Fig. 8. Every node makes a 3-bit observation. Leaves are constrained to sending 1-bit messages, while level 1 nodes are 
constrained to sending 4-bit messages. 

Consider the following strategy: each level 1 node forwards the two 1-bit messages it receives 
from its two leaves to the fusion center. It then compresses its own 3-bit observation into a
2-bit message before sending it to the fusion center. Using this strategy, the tree network is
equivalent to a parallel configuration with $3m$ nodes, $2m$ of which are constrained to sending
1-bit messages, and $m$ of which are constrained to sending 2-bit messages. Clearly, this parallel
configuration performs strictly better than one in which all $3m$ nodes are constrained to sending
1-bit messages, therefore we have $g^* < g^*_P$. □

Example 5 shows that, unlike the case of relay trees, a tree can outperform a parallel
configuration. On the other hand, Example 5 is an artifact of our assumptions. For example, if we
restrict every node in this example to sending only 1 bit, the situation is reversed and we have
$g^*_P < g^*$. The question of whether a parallel configuration always performs at least as well as a
tree network, i.e., whether $g^*_P \le g^*$, when every node can send the same number of bits, remains
open.

VI. Conclusion 

We have studied the asymptotic detection performance of tree networks with bounded height, 
under a Neyman-Pearson criterion. Similar to the parallel configuration, we have shown that 
the optimal Type II error probability decays exponentially fast with the number of nodes. In 
addition, we have shown that if leaves dominate (i.e., $l_n(f)/n \to 1$), the network can achieve
the same performance as if all nodes were transmitting directly to the fusion center. We also 
provided a simple strategy, in which all leaves use the same transmission function, and all other 
nodes act as 1-bit relays, which achieves the optimal error exponent to any desired accuracy. The 
sufficient conditions are easy to achieve in cases of practical interest, hence a system designer 
can obtain the optimal performance while ensuring that the network is energy efficient. Once the 
sufficient conditions are satisfied, the architecture of the network no longer affects its detection 
error exponent. On the other hand, we also showed that for the practically interesting case where 
$z = 1$, the sufficient conditions are also necessary. Thus, in a network where the leaves do not
dominate, the error decay rate will be worse than that of a parallel configuration, and will actually 
depend on the particular network architecture. 

Needless to say, our conclusions only hold for the particular setting and criterion we have 
employed. One issue that has not been touched upon is that, with a relay network, a significantly 
larger value of $n$ may be required before the asymptotic error exponent yields a good
approximation. Moreover, in practice, it would be wasteful to have only the leaves make observations,
if n is not large enough. Furthermore, under a Bayesian criterion, the same performance as the 
parallel configuration can no longer be achieved, although exponential decay is still possible 
[28]. Finally, the more realistic case where the i.i.d. assumption is violated, remains unexplored, 
with work mainly limited to the parallel configuration [29]-[34]. 

Future work includes characterizing the asymptotically optimal performance of tree networks 
without the bounded height constraint. We would like to understand the rate at which the error 
probability decays, and its dependence on the rate at which the height of the tree increases. 
Another intriguing question, which has been left unanswered, is whether the inequality $g^*_P \le g^*$
is always true under the bounded height assumption, when every node is constrained to sending 
the same number of bits. 
Appendix

A. Proof of Proposition 1

We first show part (i). The proof proceeds by induction on $k$. Suppose that $k = 1$, which is
equivalent to the well-studied case where all sensors transmit directly to a fusion center. In this
case, $p_n(v) = l_n(v)$. Since $t_1 \in (-D(\mathbb{P}_0^{\gamma} \,\|\, \mathbb{P}_1^{\gamma}), D(\mathbb{P}_1^{\gamma} \,\|\, \mathbb{P}_0^{\gamma}))$, from (2.2.13) of [27], we obtain

\[ \frac{1}{l_n(v)} \log \mathbb{P}_1\Big( \frac{S_n(v)}{l_n(v)} \le t_1 \Big) \le -\Lambda^*_{1,1}(\gamma, t_1). \]

The inequality for the Type I error probability follows from a similar argument.

Consider now the induction hypothesis that the result holds for some $k$. Given a $k$-uniform
tree rooted at $v$, the induction hypothesis leads to bounds on the probabilities associated with
the log-likelihood ratio $L_{v,n}$ of the message $Y_v$ computed at the node $v$. We use these bounds to
obtain bounds on the log-moment generating function of $L_{v,n}$. Recall that $L_{v,n}$ equals $\mathcal{L}_{v,n}(0)$
whenever $Y_v = 0$, which is the case if and only if $S_n(v)/l_n(v) \le t_k$. Fix some $\lambda \in [-1, 0]$. We
have

\[ \begin{aligned}
\frac{1}{l_n(v)} \log \mathbb{E}_1\big[ e^{\lambda L_{v,n}} \big]
&= \frac{1}{l_n(v)} \log \Big( \mathbb{P}_1(Y_v = 0)\, e^{\lambda \mathcal{L}_{v,n}(0)} + \mathbb{P}_1(Y_v = 1)\, e^{\lambda \mathcal{L}_{v,n}(1)} \Big) \\
&= \frac{1}{l_n(v)} \log \Big( \mathbb{P}_1(Y_v = 0)^{1+\lambda}\, \mathbb{P}_0(Y_v = 0)^{-\lambda} + \mathbb{P}_1(Y_v = 1)^{1+\lambda}\, \mathbb{P}_0(Y_v = 1)^{-\lambda} \Big) \\
&\le \frac{1}{l_n(v)} \log \Big( \mathbb{P}_1(Y_v = 0)^{1+\lambda} + \mathbb{P}_0(Y_v = 1)^{-\lambda} \Big).
\end{aligned} \]

Using the inequality $\log(a + b) \le \max\{\log(2a), \log(2b)\}$, we obtain

\[ \begin{aligned}
\frac{1}{l_n(v)} \log \mathbb{E}_1\big[ e^{\lambda L_{v,n}} \big]
&\le \max\Big\{ \frac{1+\lambda}{l_n(v)} \log \mathbb{P}_1(Y_v = 0),\ -\frac{\lambda}{l_n(v)} \log \mathbb{P}_0(Y_v = 1) \Big\} + \frac{\log 2}{l_n(v)} \\
&\le \max\big\{ -(1+\lambda)\Lambda^*_{1,k}(\gamma, t^{(k)}),\ \lambda \Lambda^*_{0,k}(\gamma, t^{(k)}) \big\} + \frac{p_n(v)}{l_n(v)} - 1 + \frac{\log 2}{l_n(v)} \qquad (18) \\
&\le \Lambda_{1,k}(\gamma, t^{(k)}; \lambda) + \frac{p_n(v)}{l_n(v)} + \frac{1}{l_n(v)} - 1, \qquad (19)
\end{aligned} \]

where (18) follows from the induction hypothesis.

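Consider now a node $u$ at level $k + 1$. The subtree rooted at $u$ is a $(k + 1)$-uniform tree.
Each level $k$ node $v \in C_n(u)$ can be viewed as the root of a $k$-uniform tree and Eq. (19) can
be applied to $L_{v,n}$. From the Markov Inequality, and since $\lambda \in [-1, 0]$, we have

\[ \mathbb{P}_1\Big( \frac{S_n(u)}{l_n(u)} \le t_{k+1} \Big) \le e^{-\lambda l_n(u) t_{k+1}}\, \mathbb{E}_1\big[ e^{\lambda S_n(u)} \big], \]

so that

\[ \begin{aligned}
\frac{1}{l_n(u)} \log \mathbb{P}_1\Big( \frac{S_n(u)}{l_n(u)} \le t_{k+1} \Big)
&\le -\lambda t_{k+1} + \frac{1}{l_n(u)} \sum_{v \in C_n(u)} \log \mathbb{E}_1\big[ e^{\lambda L_{v,n}} \big] \\
&\le -\lambda t_{k+1} + \Lambda_{1,k}(\gamma, t^{(k)}; \lambda) + \sum_{v \in C_n(u)} \Big( \frac{p_n(v)}{l_n(u)} + \frac{1}{l_n(u)} \Big) - 1 \qquad (20) \\
&= -\lambda t_{k+1} + \Lambda_{1,k}(\gamma, t^{(k)}; \lambda) + \frac{p_n(u)}{l_n(u)} - 1, \qquad (21)
\end{aligned} \]

where (20) follows from the induction hypothesis and (19). Taking the infimum over $\lambda \in [-1, 0]$
(cf. Lemma 2), and using (7), we obtain

\[ \frac{1}{l_n(u)} \log \mathbb{P}_1\Big( \frac{S_n(u)}{l_n(u)} \le t_{k+1} \Big) \le -\Lambda^*_{1,k+1}(\gamma, t^{(k+1)}) + \frac{p_n(u)}{l_n(u)} - 1. \]

A similar argument proves the result for the Type I error probability, and the proof of part (i) is
complete.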
For part (ii), suppose that for all $n \ge n_0$ and all $v \in B_n$, we have $l_n(v) \ge N$. Note that
$l_n(f) \ge N|B_n|$. Furthermore, the number of nodes at each level $k \ge 1$ is bounded by $|B_n|$,
which yields

\[ \frac{p_n(f)}{l_n(f)} - 1 \le \frac{n - l_n(f)}{l_n(f)} \le \frac{h|B_n|}{l_n(f)} \le \frac{h|B_n|}{N|B_n|} = \frac{h}{N}. \]

Applying the results from part (i), with $k = h$, we obtain part (ii).
B. Proof of Lemma 3

We have $l'_n(f) = l_n(f)$. Furthermore, it can be shown that $|B'_n| \le h|B_n|$. Therefore,

\[ \begin{aligned}
q'_{N,n} &= \frac{1}{l'_n(f)} \sum_{v \in F'_{N,n}} l'_n(v) \\
&\le \frac{1}{l_n(f)}\, N h \big( |F_{M,n}| + |F^c_{M,n}| \big) \\
&\le hN q_{M,n} + hN/M,
\end{aligned} \]

where the last inequality follows from $|F_{M,n}| \le \sum_{v \in F_{M,n}} l_n(v)$ and $|F^c_{M,n}| \le l_n(f)/M$. Taking the
limit superior as $n \to \infty$, we obtain

\[ q'_N \le h(N q_M + N/M). \]

Suppose that $q_M = 0$ for all $M > 0$. Then for all $N, M > 0$, we have

\[ q'_N \le hN/M. \]

Taking $M \to \infty$, we obtain the desired result.

C. Proof of Lemma 4 

Suppose that $q_N > 0$ for some $N > 0$. Using the inequality

\[ q_{N,n} = \frac{1}{l_n(f)} \sum_{v \in F_{N,n}} l_n(v) \le \frac{N |F_{N,n}|}{l_n(f)}, \]

or

\[ |F_{N,n}| \ge \frac{q_{N,n}\, l_n(f)}{N}, \tag{22} \]

we obtain

\[ \frac{l_n(f)}{n} \le \frac{l_n(f)}{|F_{N,n}| + l_n(f)} \le \frac{l_n(f)}{q_{N,n}\, l_n(f)/N + l_n(f)} = \frac{N}{N + q_{N,n}}. \]

Letting $n \to \infty$, we obtain

\[ z \le \frac{N}{N + q_N} < 1. \]

For the converse, suppose that $q_N = 0$ for all $N > 0$. It can be seen that each non-leaf node
is on a path that connects some $v \in B_n$ to the fusion center. Therefore, the number of non-leaf
nodes $n - l_n(f)$ is bounded by $h|B_n|$. We have

\[ \frac{n - l_n(f)}{l_n(f)} \le \frac{h|B_n|}{l_n(f)} = \frac{h\big( |F_{N,n}| + |F^c_{N,n}| \big)}{l_n(f)} \le h q_{N,n} + \frac{h}{N}. \]

Therefore,

\[ \limsup_{n \to \infty} \frac{n - l_n(f)}{l_n(f)} \le \frac{h}{N}. \]

This is true for all $N > 0$, which implies that $\lim_{n \to \infty} l_n(f)/n = 1$.

D. Proof of Lemma 6 

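For each $v \in B_n$, we have $Y_v = \gamma_v(\{\gamma_u(X_u) : u \in C_n(v)\})$, for some $\gamma_v \in \Gamma(l_n(v))$. Using
the first, and the second part of Lemma 1, there exists some $a_1 \in (0, \infty)$, such that

\[ \begin{aligned}
\mathbb{E}_0\big[ L^2_{v,n} \big]
&\le \mathbb{E}_0\Big[ \Big( \sum_{u \in C_n(v)} \log \frac{d\mathbb{P}_1^{\gamma_u}}{d\mathbb{P}_0^{\gamma_u}} \Big)^2 \Big] + 1 \\
&\le l_n(v) \sum_{u \in C_n(v)} \mathbb{E}_0\Big[ \log^2 \frac{d\mathbb{P}_1^{\gamma_u}}{d\mathbb{P}_0^{\gamma_u}} \Big] + 1 \\
&\le l_n(v)^2 a_1 + 1 \\
&\le l_n(v)^2 a, \qquad (23)
\end{aligned} \]

where $a = a_1 + 1$.

To prove the lemma, we use Chebychev's inequality, and the inequalities $l_n(v) \le N$ for
$v \in F_{N,n}$, and $|D_n| \le l_n(f)$, to obtain

\[ \begin{aligned}
\mathbb{P}_0\Big( \Big| \frac{S_n(f)}{l_n(f)} - \lambda_n \Big| > \eta \Big)
&\le \frac{1}{\eta^2 l_n(f)^2} \Big( \sum_{v \in D_n} \mathbb{E}_0\Big[ \log^2 \frac{d\mathbb{P}_1^{\gamma_v}}{d\mathbb{P}_0^{\gamma_v}} \Big] + \sum_{v \in F_{N,n}} \mathbb{E}_0\big[ L^2_{v,n} \big] \Big) \\
&\le \frac{1}{\eta^2 l_n(f)^2} \Big( |D_n|\, a + \sum_{v \in F_{N,n}} l_n(v)^2 a \Big) \qquad (24) \\
&\le \frac{(N + 1)\, a}{\eta^2 l_n(f)}, \qquad (25)
\end{aligned} \]

where (24) follows from Lemma 1 and (23). The R.H.S. of (25) goes to zero as $n \to \infty$, and
the proof is complete.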